“Big Data” and the analytics market are expected to reach $125 billion worldwide in 2015. Eighty percent of enterprises and sixty-three percent of small and medium businesses have already deployed Big Data projects or plan to deploy them in the near future. The vast majority of these projects, however, are built around structured data. Yet an estimated eighty percent of an organization's data is unstructured or only semi-structured, and a significant portion of that data consists of documents. The fact that organizations are now applying analytics tools to their structured data does not mean that their unstructured and semi-structured documents have gone away. Such documents have been, and will continue to be, an important part of an organization's data.
Semi-structured and unstructured documents are often voluminous. A single document can consist of hundreds of individual papers. For example, a purchaser's mortgage document can be stored as a single 500-page document that bundles together individual papers such as the purchaser's income tax return(s), W-2(s), and credit report, the appraiser's report, and so forth. Each purchaser is associated with a different mortgage document. Thus, both the size of individual documents and the overall volume of documents can be very large. Documents may be stored across various storage systems and/or devices and accessed by multiple departments and individuals. Documents may include different types of information and have various formats. They are used in many applications, including mortgages and lending, healthcare, land environmental, and so forth, and they are fed by multiple sources such as social networks, server logs, banking transactions, web content, GPS trails, and financial market data.